Abstract

Online learning is a familiar problem setting within
machine learning in which data is presented serially
in time to a learning agent, requiring it to progressively
adapt within the constraints of the learning
algorithm. More sophisticated variants may involve
concepts such as transfer learning, which increase this
adaptive capability, enhancing the learner’s cognitive
capacities in a manner that can begin to imitate the
open-ended learning capabilities of human beings.

We shall argue in this paper, however, that a full
realization of this notion requires that, in addition to
the capacity to adapt to novel data, autonomous online
learning must ultimately incorporate the capacity
to update its own representational capabilities in
relation to the data. We therefore enquire about the
philosophical limits of this process, and argue that
only fully embodied learners exhibiting an a priori
perception-action link in order to ground representational
adaptations are capable of exhibiting the full
range of human cognitive capability.

keywords: philosophy of machine learning, 
perception-action learning, online learning 

1 Introduction 

In the following, we aim to circumscribe the inherent
conceptual limits implicit in the notion of open-ended
machine learning - a key criterion for human-like
cognition and intelligence - and to propose a strategy for
building autonomous agents capable of operating at
this extremity of capability. We commence with
a brief statement of the problem.

1.1 Conceptual Limits to Open-Ended Learning

In Putnam’s classical ‘brain in a vat’ philosophical
thought experiment, a brain is attached via wires to a
supercomputer that simulates all aspects of the real
world, mediating this in terms of electrical signals
sent down the wires in response to input signals from
the brain in the form of nerve impulses. The thought
experiment is thus intended to address notions of radical
scepticism: could such a brain be justified in having
true beliefs? (And would these be beliefs about
objects existing within the simulated world or about
the input/output characteristics of the electrical
signals?)

A variant of this thought experiment (in fact a subset
of this thought experiment) might be the notion
of a brain in a vat attached, from birth, only via its
optic nerves to a video camera, in front of which pass,
in a temporal sequence, all of the natural scenes of
the world. The refined question is then: ‘would such
a brain represent the world in the same manner as
a typical human, one that is free to move and
interact with the world?’

It is implicitly argued within this paper that this
is not the case; indeed, it will be argued that such
fundamental human perceptual characteristics as the
delineation of the world into discrete objects (i.e. the
delineation of entities that are invariant under translation)
would not occur to a non-embodied agent.
Any such relational, non-essentialist notion of representation
([T]) has clear implications for artificial
implementations of cognitive learning, which we shall
explore in this paper.

In order to evolve this argument, we first consider 
the problem of how machine learning takes place in 
temporalized environments, and address the inherent 
limitations on this form of learning. 

2 Limits to Standard Approaches to Adaptive Online Learning

Online learning mm) is a standard form of machine-learning
induction in which data is presented serially
in time, and in which learning generally takes place
one instance at a time (it is thus the opposite of offline,
or batch, learning). It is also inherently predictive,
predicting the label values of data not yet
presented to the system. Such a system is thus inherently
adaptive; the degree of adaptation to new
data will vary from online learner to online learner
(sophisticated variants may incorporate notions such
as transfer learning (ills]), anomaly detection ([6]),
and active learning ([71[S])).

Despite this tendency towards increasing adaptivity,
however, the majority of existing approaches typically
assume an underlying consistency in the representational
characteristics of the data; the data-stream
presented to an online learner is generally
delineated in terms of a fixed set of classes, or a fixed
set of features (for example, spatial interest points or
texture descriptors). Techniques exist that partially
address these limitations, such as online learners
that incorporate Dirichlet processes to spawn novel
states in relation to the requirements of the data ([S]),
and which are thus capable of expanding their representational
characteristics to a certain extent. However,
such a learner would not be capable of spontaneously
carrying out as fundamental a data-driven representational
shift as that involved in the transition from,
say, a low-level feature-based representation of the
world (delineated e.g. in terms of colored pixels)
to an object-based representation of the world (delineated
in terms of indexed entities with associated
positions, orientations etc.), unless a prior capacity
for object representation had been incorporated into
it.

Taking the notion of autonomous adaptation to
serial data to its conceptual limit would thus require
that both the representational capabilities of
the learner and its objective knowledge-acquisition
capabilities be included in the autonomous
learning process. Consider, for example, the case of
an idealized autonomous online learning robot. Such
an idealized online learner would be capable
of spontaneously reparametrizing its representation
of the world in relation to novel sensor data; i.e. it
must not just be capable of updating its model of
the world, $W$, generated in terms of some particular
representational framework, $R$ (written $R[W]$);
it must also be able to find an appropriate transformation
of its representational framework in order to
‘most effectively represent’ the totality of the temporal
data, $W$, via some appropriate criterion. It must
thus perform the double mapping $R[W] \to R'[W']$
(composed of the individual mappings $R \to R'$ and
$W \to W'$) such that the data $W$ and $W'$ are guaranteed
to both represent the same set of entities,
as expressed by a ‘noumenal equivalence’ predicate
$\mathrm{Equiv}(R[W], R'[W'])$ or similar (we shall return to
this point later).

In general, this ‘most effective’ representation criterion
will be efficiency based - i.e. we will seek the
mapping $R[W] \to R'[W']$ that minimizes complexity
(via e.g. an MDL ([ID]) or Occam’s Razor-like criterion).
A motivation for this efficiency of representation
criterion can be found in Biosemantics m) -
humans have adapted over millions of years for efficiency
of their representative capability (in terms of
either the overall neuronal budget or the total energy
of processing). However, there are other aspects
to this natural selection of representative capabilities
that must be considered (see Section 5).

Certain machine learning paradigms are inherently
capable of the reparameterization $R[W] \to R'[W']$,
for example, manifold learning techniques ([12]) and
non-linear dimensionality reduction techniques ([13]).
The technique adopted is not significant for our wider
discussion; the key point is that following the process
we arrive at both a reparameterization framework
$R'$ (such as an orthonormal basis in manifold or
sub-manifold coordinates) and a revised data set description
$W'$ in the representational framework (e.g.
following projection into the manifold coordinates).
Typically, reparameterization will also involve a reduction
in the number of parameters required to represent
the data - i.e. the determination of some data-derived
sub-manifold $M_g$ necessarily implicates the
existence of a projection operator such that the full
range of data in the original domain, $W \subset M$, can
be mapped into $M_g$ - for instance, by collapsing data
points along the orthogonal complement, $W'^{\perp}$ ($M$ is
thus the original sensory manifold, and $M_g$ the remapped
representational framework if equipped with
a suitable basis).
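
As a minimal sketch of such a reparameterization, the following uses
plain linear PCA (via NumPy's SVD) as a stand-in for the manifold-learning
techniques mentioned above. The function name `reparameterize`, the
synthetic data and the choice k = 2 are illustrative assumptions rather
than part of the argument: the rows of `R_prime` play the role of the
framework R', and `W_prime` the role of the re-described data W'.

```python
import numpy as np

def reparameterize(W, k):
    """Hypothetical helper: linear PCA as a stand-in for manifold learning.

    Returns the data mean, an orthonormal basis R_prime (k x d) spanning a
    k-dimensional linear sub-manifold, and the data W_prime (n x k)
    expressed in that basis.
    """
    mean = W.mean(axis=0)
    centered = W - mean
    # Rows of Vt are orthonormal principal directions, sorted by variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    R_prime = Vt[:k]
    # Projection onto the basis discards the orthogonal complement.
    W_prime = centered @ R_prime.T
    return mean, R_prime, W_prime

# Synthetic 'sensor' data (an assumption): 200 points near a 2-D plane in 5-D.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
embedding = rng.normal(size=(2, 5))
W = latent @ embedding + 0.01 * rng.normal(size=(200, 5))

mean, R_prime, W_prime = reparameterize(W, k=2)
reconstruction = W_prime @ R_prime + mean
relative_error = np.linalg.norm(W - reconstruction) / np.linalg.norm(W - mean)
print(W.shape, "->", W_prime.shape, "relative error:", relative_error)
```

Here five sensor parameters per data point are reduced to two, and the
small reconstruction error indicates that little is lost by collapsing
the data along the discarded directions.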

Criteria for applying such a reductive reparameterization
are many and varied; we might use
a stochastically-motivated $2\sigma$ eigenvalue cut-off in
Principal Component Analysis to eliminate noise, for
instance, or (more generically) we might use a model
selection criterion such as the Akaike information criterion
in order to arrive at a principled way to determine
the allocation of manifold parameters in relation
to the characterization of out-of-model data
(the latter is related to minimum description length
(MDL) approaches, which in turn may be considered
approximations of the ‘intrinsic’ (incomputable) Kolmogorov
Complexity of the observed data set).
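
The model-selection step can be illustrated with a deliberately simple
stand-in. The sketch below chooses the number of parameters of a
polynomial curve using the least-squares form of the Akaike information
criterion, AIC = n ln(RSS/n) + 2k, which holds for Gaussian residuals up
to an additive constant. The polynomial model, the synthetic data and the
names `aic_for_degree` and `best_degree` are assumptions made for
illustration; they are not the manifold models discussed in the text.

```python
import numpy as np

def aic_for_degree(x, y, degree):
    """AIC (Gaussian least-squares form, constants dropped) for a polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    n = len(x)
    k = degree + 1  # number of fitted coefficients
    return n * np.log(rss / n) + 2 * k

# Synthetic data (an assumption): a noisy quadratic curve.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.1 * rng.normal(size=x.size)

# Lower AIC is better: goodness of fit is traded against parameter count.
aic = {d: aic_for_degree(x, y, d) for d in range(6)}
best_degree = min(aic, key=aic.get)
print(aic, best_degree)
```

Models with too few parameters are penalized through their residual
error, and models with too many through the 2k term, which is the sense
in which the criterion allocates parameters in a principled way.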

Whilst there might thus exist an intrinsic parameterization
of any given dataset when considered only
in terms of the efficiency of representation, the ideal
choice of representation will also, of necessity, depend
on the purpose to which the data set is put.
Thus, there will always be meta-reasons for favoring
a particular data parameterization, a particular
representational framework. In fact, efficiency of
representation is just such an extrinsic meta-reason;
Kolmogorov Complexity is thus not in this sense an
intrinsic measure of the data, but rather a measure
driven by just one among several (potentially infinitely
many) competing requirements for data representation.

3 Identity Retention in Online Learning

However, the above relates to batch processing of the
data, and therefore makes the implicit assumption
that all data points are derived from the same source,
with perhaps only an instrumentally-irrelevant temporal
delay between the collection of data points
(that are otherwise independently and identically
distributed, i.e. i.i.d.). There is hence a strong
assumption of ‘noumenal continuity’ implicit in
non-online forms of learning.

This assumption, however, becomes complicated
when considering adaptive online learning, in
which the data and the data representation both
have a temporal component. For instance, to give
perhaps the simplest instance of this problem, in Simultaneous
Localization and Mapping (SLAM) robotics
(niiia), the robotic agent’s model of the world necessarily
depends upon its calculation of its own position
and orientation in the world (i.e. it must factor
its own perspectival world-view into the world
model). However, this positional calculation is itself
dependent on (is relative to) the agent’s model of the
world (i.e. the agent describes its own position and
orientation in relation to the world model). A SLAM
agent will therefore position itself in the world (perhaps
using active learning (BSl) in order to minimize
model ambiguity) by leveraging its own, uncertain
model of the world. Interconnected ambiguities are
thus always present in both the agent’s self-model (of
its location/orientation) and its model of the world;
the hope of SLAM robotics is that, following full exploration
of the environment, these ambiguities converge
to within some manageable threshold.

In general, the SLAM problem is not soluble unless
certain a priori assumptions are made. A key such
assumption is that the environment remains reasonably
consistent over time. If an environment were
to undergo some arbitrary spatial transformation at
each iteration of the SLAM algorithm, then no convergence
would be possible (and in fact there would
be no meaning to the concept of a world model). However,
much milder perturbations of the spatial domain
would be sufficient to ensure non-convergence
of the algorithm.

A further key a priori assumption, one that shall
be particularly important in the following, but which
is often overlooked, relates to the robotic agent’s motor
capabilities. The robotic agent’s motor capability
may, in this case, be considered as that which
initiates the change of perspective/change of representation.
However, as such, it cannot in itself be
doubted (unlike the world model), and must thus be
assumed a priori. Colloquially, the agent might thus
doubt its location, or its world model, but it cannot,
if it is to work at all, doubt the fact that a specific
motor impulse has taken place (for instance, a
‘move forward’ or ‘turn left’ command). The agent
cannot converge on a world model if, for instance,
motor impulses to the actuators undergo arbitrary
permutation. Even non-arbitrary permutation would
not be distinguishable, even in principle, from a corresponding
non-arbitrary permutation of the world
space. (This non-distinguishability of perceptual manipulations
from motor manipulations is absolutely
fundamental, and has important consequences in our
later argument.)

Thus, both the world-model and the agent’s
(orientation/position-based) self-model are inherently
posited relative to its motor impulses, which can
be considered to represent the agent’s intentions in
the sense that the existence of a specific intention is
necessarily not itself open to doubt to the agent, however
uncertain its perceptual outcome. Model convergence
on a complete world model occurs when the
outcome of all actions leads to predictable perceptual
consequences (to within some given threshold).
The agent has thus obtained a complete odometry
of the environment (in human terms, we have ‘paced
out’ the domain). We can thus consider the world
model as being mapped on to a grid of motor impulses
such that, in a sense, the agent’s active capabilities
provide the metric for its perceptual data (cf.
also (iniiiiiig)).

In short, where there exists the capacity for updating
the representational capacity of an agent in
relation to perceptual data that it has sought on the
basis of its original representation, then we need some
mechanism for guaranteeing that there is either sufficient
a priori noumenal knowledge of the external
world, or else sufficient a priori assumptions made
regarding the process (e.g. movement) that initiates
new data acquisition, in order for the representation-updating
procedure to converge. Although this is
problematic in SLAM, the problem is much more
acute in fully open-ended learning scenarios where
whole new categories of perception can be generated.

Of course, in dealing with a priori requirements 
for perception in an empirical setting, the relevant 
philosopher is Kant - we now look more closely at 
this issue in Kantian terms. 

3.1 The Kantian Perspective on Cognitive Agency

We are essentially, in the above, asking the question
of how, in an adaptive online learning context, it is
ever possible for us to empirically validate a proposed
change to our representative capability (how is it, in
a Popperian sense, possible to falsify a proposed representational
update?). Falsification of a world model
is, by comparison, straightforward in a standard autonomous
robotic system, in that a world model typically
constitutes a set of proposed haptic affordances
([20] [n]) gathered at-a-distance by a vision system.
Thus, the visual model typically denotes a set of object
hypotheses that may be verified via haptic contact
(I^El]).

Haptic contact is thus typically considered to be
prior to vision, or at least a priori less prone to ambiguity
than vision. This is also experienced to an
extent in human terms; we tend to consider something
that we can touch, but not see (for example,
a ‘force field’) as intrinsically having substance,
whereas we consider something that we can see occupying
volumetric space, but which we cannot verify by touch,
as being intrinsically illusory (a holographic image, for
instance).

However, in a hypothetical automaton where there
exists complete representational fluidity, such that a
completely novel sensorium could be developed (for
instance by combining sonar data with visual data in
some hybrid world description), we cannot a priori
favor one group of senses/sensors over another in
order to delineate hypotheses about the world. Moreover,
there is no immediately obvious way to form
hypotheses about the most appropriate representational
framework to adopt.

In order to address this, we borrow a key insight
from Kant; namely that object concepts constitute
orderings of sensory intuitions ([H]). Objects, as we
understand them, do not thus constitute singular percepts,
but rather synthetic unities built upon an a
priori linkage that must be assumed between sensory
intuitions and the external noumenal world (these a
priori links cannot be in doubt since they are a condition
of empirical validation for synthetic unities).
Implicit in this is the notion that actions can be deployed
to test the validity of these synthetic unities
(which, being synthetic rather than analytic, are only
contingently true, and therefore falsifiable through
experience). Actions are thus causally initiated by
the agent and serve to bring aspects of the synthetic
unities to attention (within the a priori strata of
space and time) in a way that renders them falsifiable.

For Kant, assuming that spatiality and temporal
causality are a priori means that they are assumed
by the agent in order to have falsifiable perceptions at
all; in principle, other ordering approaches to sensory
data may be possible. However, it would be impossible
for the agent to retain the continuity and falsifiability
of object representation across such a fundamental
transition of representation (it would also
be impossible for a self-conscious agent to retain its
identity - or ‘synthetic unity of apperception’ - across
such a fundamental representational chasm). This is
the problem of ‘noumenal continuity’ that we identified
earlier; how can an agent that undergoes a change
of representation framework at time $t_0$ ever be sure
that the objects delineated at $t_0 - 1$ were the same objects
as those delineated at $t_0 + 1$? (Indeed, would the
number of objects even be preserved? A cognitive
agent might, for instance, hypothesize a perceptual
change in which the independent perceptual axes of
color-awareness and shape-awareness were combined
in a single-dimension unity, such that only one color
was allowed per shape, with the corresponding inability
to discriminate all of the objects previously
discriminated.) An online learner would therefore appear
to be severely limited in the extent to which it
could utilize data across representational changes; in
short the agent would no longer be a strictly online
learner, but rather a serial batch-learner.

However, there is one way in which novel representational
changes can be made while retaining an
agent’s ability to falsify both these as well as any object
hypotheses (synthetic unities) formed in terms
of these representational changes and, moreover, do
so while retaining online continuity of object identity
(when extended in perception-action terms - see below).
This is when representational changes are built
hierarchically.

By way of example, consider how, as humans, we
typically represent our environment when driving a
vehicle. At one level, we internally represent the immediate
environment in metric-related terms (i.e. we
are concerned with our proximity to other road users,
to the curb and so on) (IIS]). At a higher level,
however, we are concerned primarily with navigation-related
entities (i.e. how individual roads are connected).
That the latter constitutes a higher hierarchical
level, both mathematically and experientially,
is guaranteed by the fact that the topological representation
subsumes, or supervenes upon, the metric
representation; i.e. the metric level provides additional
‘fine-grained’ information to the road topology:
the metric representation can be reduced to the topological
representation, but not vice versa. In robotics,
when goals and sub-goals are explicitly delineated at
each level, this is known as a subsumption hierarchy
m).

In a fully adaptive online learner, it is thus possible
to provide a grounded approach to representational
induction by adopting a correspondingly hierarchical
approach. Thus, on the assumption of the existence
of an a priori means of validating low-level hypotheses
(for example via haptic contact), it is possible
to construct falsifiable higher-level representational
hypotheses provided that these subsume the low-level ones.
Thus, for instance, an embodied autonomous robotic
agent might, following active experimentation, spontaneously
conceive a high-level concept of affordance,
or schema ([27]), such as that of container. Clearly, in
this case, the notion container subsumes the concept
of haptic contact.

Continuity of noumenal identity is thus guaranteed
by the lowest level of the hierarchy, with the
higher hierarchical levels constituting progressive abstractions
and enrichments of the lower level representations.
An embodied autonomous robotic agent
might therefore initially represent the world in terms
of (hypothetical) volume elements such as voxels or
3D meshes (the a priori bootstrap representation),
but, following extensive experimentation, might then
go on to generate an enriched representation of its
world at a higher level in which containers and non-containers
are delineated. (Note that the original representation
of the world in volumetric terms is thus
still present.)

Falsifiability of the representational concept ‘container’
is thus guaranteed, just as it is possible to
guarantee the falsifiability of the hypothesis of the existence
of any specific container, by exploiting the fact
that these hypotheses are grounded throughout the
hierarchy. Thus, on the one hand, the hypothesis
of the existence of a specific container is rendered falsifiable
by haptic contact (and its higher level corollaries);
i.e. the agent can test whether the proposed
container-entity is, in fact, capable of containing another
object.

On the other hand, the high-level representational
concept ‘container’ is rendered falsifiable by the fact
that it is conceived along with a corresponding high-level
action, e.g. ‘placing an object into a container’,
which necessarily subsumes lower-level concepts such
as ‘haptic contact’ etc. Thus, the representational
concept is rendered falsifiable on the basis of its utility
and compressibility.

To see how this works, suppose that an autonomous
agent discovers, by chance exploratory
activity (e.g. motor babbling ([35])), or
via activity driven by lower-level action imperatives,
that the previously defined concept ‘object’ yields an
exception that allows for objects to be placed co-extantly
in the same location as another object (the
original concept ‘object’ assumes that objects placed
on top of each other do not then co-exist in the same
location). Further suppose that, on the basis of this
exception, the object class is refined by the agent to
accommodate the higher level notion of ‘container’
(i.e. so that the concept of ‘object’ subsumes the concept
of ‘container’). This then constitutes a representational
hypothesis, which can be applied to the world
(e.g. by training a classifier to distinguish container-objects
from non-container-objects).

The falsifiability of this concept then arises from
actively addressing the question of whether this higher-level
perception of the world (as a series of objects
in space that are either container-objects or non-container-objects)
in fact constitutes a useful description
of the world, i.e. whether it yields a net compression
in the agent’s internal representation of its
own possible interactions with the world (its affordances).
Thus, if there were only a single container
in the world, or if it were not possible to train an
accurate classifier for containers in general, then it
would be unlikely to constitute a useful description
of the world; it would likely be more efficient simply
to retain the existing concept of object without
modification. However, when the world is in fact constituted
of objects for which it is an efficient compression
of the agent’s action capability to instigate such
a modification of the object concept, then it is appropriate
for a representationally-autonomous agent
to spontaneously form a higher level of its representational
hierarchy. (For an example of this approach
utilizing first-order logic induction see ([29]).)

Very often compressibility will be predicated on
the discovery of invariances in the existing perceptual
space with respect to randomized exploratory
actions. Thus, for example, an agent might progress
from a pixel-based representation of the world to an
object-based representation of the world via the discovery
that certain patches of pixels retain their (relative)
identity under translation, i.e. such that it
becomes far more efficient to represent the world in
terms of indexed objects rather than pixel intensities
(though the latter would, of course, still constitute
the base of the representational hierarchy). This particular
representational enhancement can represent
an enormous compression ([30]): a pixel-based representation
has a parametric magnitude of $P^n$ (with $P$
and $n$ being the intensity resolution and number of
pixels, respectively), while an object-based representation
typically has a parametric magnitude of $\sim n^o$,
$o \ll n$, where $o$ is the number of objects.
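
A rough worked example makes the scale of this compression concrete. It
reads the object-based magnitude n^o as each of o objects occupying one
of n pixel positions, which is an interpretive assumption, and the values
P = 256, n = 64 x 64 and o = 5 are likewise arbitrary illustrative
choices. Logarithms are used because P^n itself is far too large to print
usefully.

```python
import math

def log10_pixel_magnitude(P, n):
    """log10 of P**n: distinct images with n pixels at P intensity levels."""
    return n * math.log10(P)

def log10_object_magnitude(n, o):
    """log10 of n**o: o objects, each at one of n positions (assumed reading)."""
    return o * math.log10(n)

P, n, o = 256, 64 * 64, 5  # illustrative values only
pixel_log = log10_pixel_magnitude(P, n)
object_log = log10_object_magnitude(n, o)
print(f"pixel-based:  about 10^{pixel_log:.0f} configurations")
print(f"object-based: about 10^{object_log:.0f} configurations")
```

Even for this small image the two magnitudes differ by thousands of
orders of magnitude, which is the sense in which the object-based
description is an enormous compression.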
In positing this hierarchical approach to representational
adaptation, we have thus outlined a framework
in which complete representational autonomy
for an embodied machine learner becomes feasible,
one in which representations are empirically validatable,
and in which the ‘noumenal continuity’ of identified
entities can be assumed across representational
transformations.

A key aspect of this falsifiability is the requirement
that the spontaneous generation of higher-level perceptions
in the agent’s representational hierarchy correlates
directly with higher level actions. We now
look more closely at this perception-action connection,
and consider the low-level a priori guarantees
of representative falsifiability.

4 Perception-Action Learning 

Perception-Action learning is a novel paradigm in
robotics that aims to address significant deficits in
traditional approaches to embodied computer vision
m). In particular, in the conventional approach to
autonomous robotics, a computer vision system will
typically be employed to build a model of the agent’s
environment prior to the act of planning the agent’s
actions within the domain. Visual data arising from
these actions will then typically be used to further
constrain the environment model, either actively or
passively (in active learning the agent’s actions are
driven by the imperative of reducing ambiguity in
the environment model).

However, it is apparent that there exists, in this
approach, a very wide disparity between the visual
parameterization of the agent’s domain and its action
capabilities within it ([22]). For instance, the
parametric freedom of a front-mounted camera will
typically encompass the full intensity ranges of the
Red, Green and Blue channels of each individual
pixel of the camera CCD, such that the range of possible
images that might be generated in each time-frame
is of an extremely large order of magnitude (of
course, only a minuscule fraction of this representational
space is ever likely to be experienced by the
agent). On the other hand, the agent’s motor capability
is likely to be very much more constrained (perhaps
consisting of the possible Euler angle settings of
the various actuator motors). This disparity leads directly
to the classical problems of framing ([33]) and
symbol grounding ([34]) (note that this observation is
not limited purely to vision-based approaches - alternative
modalities such as LIDAR and SONAR would
also exhibit the same issues).

Perception-Action (P-A) learning aims to overcome
these issues by adopting as its motto, ‘action
precedes perception’ ([35] [33]). By this it is meant
that, in a strict sense (to be defined), actions are
conceptually prior to perceptions; i.e. that perceptual
capabilities should depend on action capabilities
and not vice versa.

Thus, a Perception-Action learning agent proceeds
by randomly sampling its action space (‘motor babbling’).
For each motor action that produces a discernible
perceptual output in the bootstrap representation
space $S$ (consisting of e.g. camera pixels), a
percept $p_i \in S$ is greedily allocated. The agent thus
progressively arrives at a set of novel percepts that
relate directly to the agent’s action capabilities in relation
to the constraints of the environment (i.e. the
environment’s affordances); the agent learns to perceive
only that which it can change. More accurately,
the agent learns to perceive only that which it hypothesizes
that it can change - thus, the set of experimental
data points $\bigcup_i p_i \subset S$ can, in theory, be generalized
over so as to create a percept-manifold that can be
mapped onto the action space via e.g. the bijective
relation $\{\mathrm{actions}\} \leftrightarrow \{\mathrm{percept}_{\mathrm{initial}}\} \times \{\mathrm{percept}_{\mathrm{final}}\}$
(i.e. such that each hypothesizable action has a
unique, discriminable outcome) ([22] [37] [33]).
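
The babbling procedure described above can be sketched in a toy setting.
Everything concrete here is an assumption made for illustration: a
one-dimensional corridor of five cells stands in for the bootstrap space
S, the agent's position stands in for its camera percept, and the actions
'left', 'right' and 'blink' are hypothetical motor commands, of which
'blink' never produces a discernible change.

```python
import random

ACTIONS = ["left", "right", "blink"]  # hypothetical motor commands
MOVES = {"left": -1, "right": 1, "blink": 0}

def step(position, action, size):
    """Toy environment: a corridor of `size` cells with walls at both ends."""
    return min(max(position + MOVES[action], 0), size - 1)

def motor_babble(n_trials=500, size=5, seed=0):
    """Randomly sample actions; allocate a percept only when something changes."""
    rng = random.Random(seed)
    percepts = {}  # (initial, final) percept pair -> action
    position = 0
    for _ in range(n_trials):
        action = rng.choice(ACTIONS)
        new_position = step(position, action, size)
        if new_position != position:  # discernible perceptual output
            # Greedy allocation: keep the first action seen for this pair.
            percepts.setdefault((position, new_position), action)
        position = new_position
    return percepts

percepts = motor_babble()
for (initial, final), action in sorted(percepts.items()):
    print(f"{action:>5}: {initial} -> {final}")
```

In this toy world no percept is ever allocated for 'blink', nor for moves
into a wall, so the learned (initial, final) pairs cover only what the
agent can actually change. Each allocated pair is associated with a single
action, a discrete analogue of the relation between actions and
initial/final percept pairs.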

When such a perceptual manifold is created (representing
a generalization over the tested space of action
possibilities), this then permits an active sampling
of the perceptual domain - the agent can propose
actions with perceptual outcomes that have not
yet been experienced by the agent, but which are consistent
with its current representational model (again,
this guarantees falsifiability of the perceptual model).
It is in this way that Perception-Action learning constitutes
a form of active learning: randomized selection
of perceptual goals within the hypothesized
perception-action manifold leads more rapidly to the
capture of data that might falsify the hypothesis than
would otherwise be the case (i.e. if the agent were
performing randomly-selected actions within the
original motor domain). Thus, while the system is
always ‘motor babbling’ in a manner analogous to
the learning process of infant humans, the fact of carrying
out this motor babbling in a higher-level P-A
manifold means that the learning system as a whole
more rapidly converges on the correct model of the
world.

Of course, this P-A motor-babbling activity can
take place in any P-A manifold, of whatever level of
abstraction; we may thus, by combining the idea of
P-A learning with the notion of hierarchical representation
presented above, conceive of the notion of a hierarchical
Perception-Action learner (ESI), in which a
vertical representation hierarchy is progressively constructed
for which randomized exploratory motor activity
at the highest level of the corresponding motor
hierarchy would rapidly converge on an ideal representation
of the agent’s world in terms of its affordance
potentialities. Such a system would thus converge
upon both a model of the world, and an ideal
strategy for representation of that world in terms of
the learning agent’s action capabilities within it.

Perceptual goals thus exist at all levels of the
hierarchy, and the subsumptive nature of the hierarchy
means that goals and sub-goals are scheduled with
increasingly specific content as the high-level abstract
goal is progressively grounded through the hierarchy.
(Thus, as humans, we may conceive the high-level
intention ‘drive to work’, which in order to be enacted,
involves the execution of a large range of sub-goals
with correspondingly lower-level perceptual goals e.g.
‘stay in the center of the lane’, etc.).
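
The progressive grounding of a high-level goal can be sketched as a depth-first expansion over a goal hierarchy. This is a minimal sketch, assuming a fixed, hand-written hierarchy; in the framework described here the hierarchy would itself be learned. The intermediate goals ‘leave driveway’ and ‘follow route’ are hypothetical, as are the names `HIERARCHY` and `ground`.

```python
# Hypothetical goal hierarchy: each goal maps to its ordered sub-goals.
# Goals absent from the dictionary are treated as lowest-level perceptual goals.
HIERARCHY = {
    "drive to work": ["leave driveway", "follow route"],
    "follow route": ["stay in the center of the lane",
                     "turn left at the town-hall"],
}


def ground(goal, hierarchy, depth=0):
    """Yield (depth, goal) pairs in the order the sub-goals are scheduled."""
    yield depth, goal
    for sub_goal in hierarchy.get(goal, []):
        yield from ground(sub_goal, hierarchy, depth + 1)


schedule = list(ground("drive to work", HIERARCHY))
for depth, goal in schedule:
    print("  " * depth + goal)
```

Each step down the hierarchy replaces an abstract perceptual goal with more specific ones, so that only the leaves need to be expressed in terms of directly testable percepts.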

Finally, we look at how such a system for
representational updating might have spontaneously
evolved in humans, and how the wider question of
representational fluidity fits into a biological context.

5 The Biophilosophical Perspective

Biosemantics, as a sub-branch of Biophilosophy, was
proposed by Millikan [11] as an attempt to subsume
certain philosophical questions of representation and
perception within the purview of biology, and in
particular, the contingencies that arise from consistency
with respect to natural selection.

We have indicated earlier that a key notion of
Biosemantics lies in motivating an efficiency of
representation criterion; organisms are naturally-selected
for efficiency of their representative capability in
terms of either overall neuronal budget or total
energy of processing.

However, a further aspect implicit in Biosemantics
is the embodiment of the agent. Thus, the biological
organism’s representative capability must, in
addition to being maximally or near-maximally efficient,
also be of utility to the organism in perpetuating its
genetic code (i.e. it must be consistent with Natural
Selection) if it is to be consistently propagated. In
practical terms, this means that the organism must
be able to discriminate those entities (food,
predators, mates etc.) that are key to its survival and
reproduction ([40]). However, the biological agent will
also have acquired, by Natural Selection, an active
capability that is likewise evolved to maximize the
organism’s ability to propagate its genetic code; i.e.
its ability to interact with the environment is adapted
to maximize its survival and reproductive capability;
a lobster’s claws are evolved for opening shells etc.
The perceptual and the active capabilities of most
organisms have thus evolved in lock-step; the
organism perceives only (since it must maximize efficiency
of representation) that which is relevant to its
survival and reproduction in addition to that which it
is capable of interacting with so as to maximize its
survival and reproductive capability.

However, this describes a biological entity with a
fixed, evolved representational framework (whether
in a natural or simulated environment [41]).
Humans, however, have, to a larger degree than any
other animal, acquired the capacity to be able to
reconfigure their neuronal and perceptual structure in
relation to the environment in ways that go far
beyond the immediate biological requirements. Thus,
rather than both the organism’s representational
framework, R, and the organism’s active capability,
A, having been adapted to the world, w, over time,
in human beings, the representation framework is
capable of adapting directly to the world, w. (Which
is not to say that the possibility of reconfiguration
does not serve our biological ends, simply that any
particular reconfiguration occurs in relation to the
biologically-experienced facts of the world, and is not
itself naturally-selected for optimality with respect to
the organism’s long-term capacity for survival). These
perceptual reconfigurations can be very abstract; we
can thus, for instance, perceive the world in terms of
the interaction between socio-economic groups if we
are an economist, or in aesthetic terms if we are an
artist. (Note that we are not suggesting that humans
are able to update all of their representative capacity,
only some significant fraction of it).

This reconfigurability of human perceptual
structure in relation to the environment makes it critically
different from the more usual naturally-selected
perceptual capability found amongst other organisms.
There is, in particular, no immediate survival
imperative attached to perceptual reconfiguration, other
than by proxy (for instance there may be a
constraint on the total neuronal/energy budget involved
in the perceptual reconfiguration). However, such
‘budgetary’ proxies for the requirements of natural
selection are not, in themselves, sufficient to motivate
any particular reconfiguration of the human agent’s
perceptual capability - for this we need an additional
proxy criterion, one that leads to a ‘retention of
active capability’ (if neuronal efficiency alone were the
criterion for perceptual updating, then it would
always be optimal to map increasing numbers of the
original percepts on to a singular novel percept).

The two principal (non-naturally-selected)
operative criteria for perceptual updating in humans are
thus:

1. Obtaining a maximally efficient representation of 
the environment 

in combination with: 

2. Ensuring the discriminability of the active
capabilities of the agent, as well as key entities related to
survival/reproduction/nutrition.

By the ‘discriminability of the active capabilities’
in the latter constraint, we mean the ability to
perceive the outcomes of intended actions undertaken
by the agent i.e. an intentional action (or at least
one initiated by the goal-setting aspect of the agent’s
cognition), should be susceptible to the sensory
determination of its having taken place as intended. (In
straightforward terms we might say that an
‘intentional action’ is that which has a specific percept as
its success criterion.)
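
The interplay of the two criteria can be illustrated with a small selection procedure. This is a sketch under stated assumptions rather than a proposed mechanism: representational cost is taken to be simply the number of distinct percepts retained, discriminability is reduced to the uniqueness of each action's (initial, final) percept pair, and the discriminability of key survival-related entities is omitted. The percept and action labels, and all function names, are hypothetical.

```python
def representation_size(remap):
    """Criterion 1: number of distinct percepts retained after remapping."""
    return len(set(remap.values()))


def actions_discriminable(actions, remap):
    """Criterion 2: each action keeps a unique (initial, final) percept pair."""
    pairs = [(remap[initial], remap[final])
             for initial, final in actions.values()]
    return len(set(pairs)) == len(pairs)


def best_remap(actions, candidates):
    """Most compact candidate that keeps every action discriminable.

    Raises ValueError if no candidate satisfies criterion 2.
    """
    valid = [r for r in candidates if actions_discriminable(actions, r)]
    return min(valid, key=representation_size)


# Hypothetical original percepts: a location (A or B) under two lighting states.
percepts = ["A_dark", "A_light", "B_dark", "B_light"]
actions = {"go_to_B": ("A_dark", "B_dark"), "go_to_A": ("B_dark", "A_dark")}

identity = {p: p for p in percepts}                     # 4 percepts
merge_lighting = {p: p.split("_")[0] for p in percepts}  # 2 percepts: A, B
collapse_all = {p: "X" for p in percepts}                # 1 percept

chosen = best_remap(actions, [identity, merge_lighting, collapse_all])
print(chosen)
```

Efficiency alone would select `collapse_all`, the single novel percept; the discriminability criterion excludes it because both actions would then have the same perceptual outcome, leaving `merge_lighting` as the most compact admissible update.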

It is thus clear, whether considered from an a priori
Kantian, or an a posteriori Biophilosophical
perspective, that perceptions and actions must retain a
fundamental link in any representationally-adaptive
online learning system capable of emulating human
cognitive capabilities.

6 Conclusion 

We have thus proposed hierarchical perception-action
learning as the idealized form of adaptive online
learning, which, by virtue of its embodiment within
the environment, is able to empirically validate both
its model of the world and its representation of the
world.

An important corollary of this approach is that,
at no stage, is there any requirement for global
hierarchical consistency of representation (thus, as
humans, we do not carry around within us a set of exact
Cartesian coordinate locations of the key elements of
our native town; rather what we retain is a series
of motor imperatives to be triggered in relation to
key percepts: e.g. ‘turn left at the town-hall’). In a
sense, in a Perception-Action learning agent, “the
environment has become its own representation”, m),
which naturally represents a very significant
compression of the information that an agent needs to retain.

This relates to the issue of symbol grounding, a
seminal problem in the conceptual underpinning of
the classical approach to machine learning ([33]). The
problem arises when one attempts to relate an
abstract symbol manipulation system (it was a common
historical assumption that computational reasoning
would center on first-order logic deduction) with the
stochastic, shifting reality of sensor data. In
hierarchical P-A learning the problem is eliminated by
virtue of the fact that representations are abstracted
from the bottom-up ([43] [44] HU ESI). They are thus
always intrinsically grounded (indeed this grounding
is the main guarantor of their falsifiability).

We finally note that motor-babbling at the top
of the representation hierarchy would necessarily
involve the spontaneous scheduling of perceptual goals
and sub-goals at the lower level of the hierarchy
in a way that would (as the hierarchy becomes
deeper) necessarily look increasingly ‘intentional’ (a
phenomenon that is readily apparent in the
development of motor movement of human infants).

Hierarchical P-A learning would therefore seem the
natural direction of progress in embodied adaptive
online learning. The question then arises of how this
would apply if the embodiment that guarantees the
falsifiability of representational updating is in a
domain that is not directly physical e.g. when the
adaptive online learner is, for example, a web-crawling
robot (indeed, what does ‘embodiment’ mean in this
context?). Could such an agent spontaneously adapt
to perceive high level concepts in, for example, HTML
data while retaining the integrity of its underlying
‘motor space’?

The answer, in this case, hinges on the fact that the
agent’s actions are the searching and indexing actions
undertaken by the robot; it is embodied in so far as
it has a location with respect to these action
capabilities. At the lowest (a priori) level there is thus the
basic ability to move between web-pages; this
capability cannot, under any circumstances, be altered by
the agent. It is, however, quite free to spontaneously
form higher level search and index capabilities built
upon these, for example by meta-indexing documents
in terms of discovered higher-level subject-matters.
The agent is thus capable of complete flexibility of
hierarchical representation with respect to the
falsifiability constraints that we have outlined, and is thus
a fully-constituted hierarchical P-A learner.
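
A toy version of such an agent may help to fix ideas. The sketch below assumes a hypothetical in-memory ‘web’ in which every page carries a directly perceivable topic label; the page names, the topic labels and the functions `move`, `build_topic_index` and `go_to_topic` are all illustrative. The only a priori capability is `move`; the topic index is a higher-level construct built exclusively from it, and the higher-level action has a percept (the topic perceived on arrival) as its success criterion.

```python
# Hypothetical toy web: page -> (perceived topic, outgoing links).
WEB = {
    "home": ("misc", ["ml", "cooking"]),
    "ml": ("learning", ["online"]),
    "online": ("learning", []),
    "cooking": ("food", []),
}


def move(page, link):
    """A priori primitive: follow one outgoing link. Never altered by the agent."""
    if link not in WEB[page][1]:
        raise ValueError(f"no link from {page!r} to {link!r}")
    return link


def build_topic_index(start):
    """Explore using only move(); map each discovered topic to primitive paths."""
    index = {}
    seen = {start}
    stack = [(start, [start])]
    while stack:
        page, path = stack.pop()
        index.setdefault(WEB[page][0], []).append(path)
        for link in WEB[page][1]:
            if link not in seen:
                seen.add(link)
                stack.append((move(page, link), path + [link]))
    return index


def go_to_topic(index, topic):
    """Higher-level action: replay a stored path and test the resulting percept."""
    path = index[topic][0]
    page = path[0]
    for link in path[1:]:
        page = move(page, link)
    return page, WEB[page][0] == topic


index = build_topic_index("home")
print(go_to_topic(index, "learning"))
```

Because `go_to_topic` is executed purely through `move`, a mistaken index entry would show up as a failed percept test, which is the sense in which the higher-level representation remains falsifiable.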

The proposed framework is thus one of very general
applicability, and one which, we believe, has the
potential to address the fundamental conceptual deficits
in standard notions of adaptive online learning that
we have outlined.


References 

[1] L. Wittgenstein, Philosophical investigations :
the German text with a revised English translation
by Ludwig Wittgenstein. Oxford : Blackwell, 2001.

[2] J.-B. Pothin and C. Richard, “Online learning
with kernels: a new approach for sparsity control
based on a coherence criterion,” in Machine
Learning for Signal Processing, 2006. Proceedings
of the 2006 16th IEEE Signal Processing
Society Workshop on, Sept 2006, pp. 241-245.

[3] J. Kivinen, A. Smola, and R. Williamson, “Online
learning with kernels,” Signal Processing,
IEEE Transactions on, vol. 52, no. 8, pp. 2165-2176,
Aug 2004.

[4] S. J. Pan and Q. Yang, “A survey on transfer
learning,” Knowledge and Data Engineering,
IEEE Transactions on, vol. 22, no. 10, pp. 1345-1359,
2010.

[5] M. E. Taylor and P. Stone, “Transfer learning for
reinforcement learning domains: A survey,” The
Journal of Machine Learning Research, vol. 10,
pp. 1633-1685, 2009.

[6] V. Chandola, A. Banerjee, and V. Kumar,
“Anomaly detection: A survey,” ACM Computing
Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.

[7] B. Settles, “Active learning literature survey,”
University of Wisconsin, Madison, 2010.

[8] V. Koltchinskii, “Rademacher complexities and
bounding the excess risk in active learning,” The
Journal of Machine Learning Research, vol. 11,
pp. 2457-2485, 2010.

[9] M. Hoffman, D. M. Blei, and F. Bach, “Online
learning for latent Dirichlet allocation,” Advances
in Neural Information Processing Systems,
vol. 23, pp. 856-864, 2010.

[10] J. Rissanen, Minimum description length
principle. Springer, 2010.





[11] R. G. Millikan, Language, Thought, and Other
Biological Categories: New Foundations for
Realism. The MIT Press; Reprint edition,
December 1987.

[12] Z. Zhang, J. Wang, and H. Zha, “Adaptive
manifold learning,” Pattern Analysis and Machine
Intelligence, IEEE Transactions on, vol. 34,
no. 2, pp. 253-265, 2012.

[13] M. Debruyne, M. Hubert, and J. Van Horebeek,
“Detecting influential observations in kernel
PCA,” Computational Statistics & Data Analysis,
vol. 54, no. 12, pp. 3007-3019, 2010.

[14] N. Engelhard, F. Endres, J. Hess, J. Sturm, and
W. Burgard, “Real-time 3D visual SLAM with a
hand-held RGB-D camera,” in Proc. of the RGB-D
Workshop on 3D Perception in Robotics at
the European Robotics Forum, Vasteras, Sweden,
vol. 2011, 2011.

[15] H. Strasdat, J. Montiel, and A. Davison, “Scale
drift-aware large scale monocular SLAM,” in
Proceedings of Robotics: Science and Systems
(RSS), vol. 2, no. 3, 2010, p. 5.

[16] N. Fairfield and D. Wettergreen, “Active SLAM
and loop prediction with the segmented map
using simplified models,” in Field and Service
Robotics: Results of the 7th International
Conference, vol. 62. Springer, 2010, p. 173.

[17] J. Dewey, “The reflex arc concept in psychology,”
The Psychological Review, no. 3, pp. 356-370,
1896.

[18] A. Glenberg, “What memory is for,” Behavioral
and Brain Sciences, vol. 20, no. 1, pp. 1-55,
1997.

[19] G. Lakoff and M. Johnson, Philosophy in the
Flesh : The Embodied Mind and Its Challenge
to Western Thought. Harper Collins Publishers,
1999.

[20] J. J. Gibson, The ecological approach to visual
perception. Boston: Houghton-Mifflin, 1979.


[21] J. McGrenere and W. Ho, “Affordances:
Clarifying and evolving a concept,” in Proceedings
of Graphics Interface 2000, Montreal, Canada,
2000, pp. 179-186.

[22] J. Saunders and D. C. Knill, “Visual feedback
control of hand movements,” J. of Neuroscience,
vol. 24, no. 13, pp. 3223-3234, 2004.

[23] E. J. Schlicht and P. R. Schrater, “Bayesian
model for reaching and grasping peripheral and
occluded targets,” Journal of Vision, vol. 3,
no. 9, p. 261, 2003.

[24] I. Kant, Critique of Pure Reason, P. Guyer and
A. W. Wood, Eds. Cambridge University Press,
1999.

[25] D. Windridge, A. Shaukat, and E. Hollnagel,
“Characterizing driver intention via hierarchical
perception-action modeling,” Human-Machine
Systems, IEEE Transactions on, vol. 43, no. 1,
pp. 17-31, 2013.

[26] R. A. Brooks, “Intelligence without
representation,” Artificial Intelligence, vol. 47,
pp. 139-159, 1991.

[27] D. L. Hintzman, “Schema abstraction in a
multiple-trace memory model,” Psychological
Review, vol. 93, no. 4, pp. 411-428, 1986.

[28] J. Modayil and B. Kuipers, “Autonomous
development of a grounded object ontology by a
learning robot,” in Proceedings of the national
conference on Artificial intelligence, vol. 22,
no. 2. Menlo Park, CA; Cambridge, MA; London;
AAAI Press; MIT Press; 1999, 2007, p. 1095.

[29] D. Windridge and J. Kittler, “Perception-action
learning as an epistemologically-consistent
model for self-updating cognitive representation,”
in Brain Inspired Cognitive Systems 2008.
Springer, 2010, pp. 95-134.

[30] J. G. Wolff, “Cognitive development as
optimisation,” in Computational Models of Learning,
L. Bolc, Ed. Heidelberg: Springer-Verlag, 1987,
pp. 161-205.





[31] H. Dreyfus, What Computers Can’t Do. New
York: Harper and Row, 1972.

[32] C. L. Nehaniv, D. Polani, K. Dautenhahn,
R. te Boekhorst, and L. Canamero, “Meaningful
information, sensor evolution, and the temporal
horizon of embodied organisms,” in Artificial
Life VIII, B. Standish, Abbass, Ed. MIT Press,
2002, pp. 345-349.

[33] J. McCarthy and P. Hayes, “Some philosophical
problems from the standpoint of artificial
intelligence,” Machine Intelligence, no. 4, pp. 463-502,
1969.

[34] S. Harnad, “The symbol grounding problem,”
Physica D, no. 42, pp. 335-346, 1990.

[35] G. Granlund, “Organization of architectures
for cognitive vision systems,” in Proceedings
of Workshop on Cognitive Vision, Schloss
Dagstuhl, Germany, 2003.

[36] M. Felsberg, J. Wiklund, and G. Granlund,
“Exploratory learning structures in artificial
cognitive systems,” Image and Vision Computing,
vol. 27, no. 11, pp. 1671-1687, 2009.

[37] D. Windridge and J. Kittler, “Epistemic
constraints on autonomous symbolic representation
in natural and artificial agents,” in Studies
in Computational Intelligence: Applications of
Computational Intelligence in Biology. Springer
Berlin Heidelberg, 2008, vol. 122, pp. 395-422.

[38] D. Windridge, M. Felsberg, and A. Shaukat,
“A framework for hierarchical perception-action
learning utilizing fuzzy reasoning,” Cybernetics,
IEEE Transactions on, vol. 43, no. 1, pp. 155-169,
Feb 2013.

[39] M. Shevchenko, D. Windridge, and J. Kittler,
“A linear-complexity reparameterisation strategy
for the hierarchical bootstrapping of capabilities
within perception-action architectures,”
Image and Vision Computing, vol. 27, no. 11,
pp. 1702-1714, 2009.


[40] J. Piaget, Genetic Epistemology. New York:
Columbia University Press, 1970.

[41] M. Sipper, “An introduction to artificial life,”
Explorations in Artificial Life (special issue of
AI Expert), pp. 4-8, September 1995.

[42] A. Newell and H. Simon, “The theory of human
problem solving; reprinted in Collins & Smith
(eds.),” in Readings in Cognitive Science,
section 1.3., 1976.

[43] D. Marr, Vision: A Computational Approach.
San Francisco: Freeman & Co., 1982.

[44] P. Gardenfors, “How logic emerges from the
dynamics of information,” Logic and Information
Flow, pp. 49-77, 1994.

[45] J. Modayil, “Bootstrap learning a perceptually
grounded object ontology,” 2005, retr. 9/5/2005
http://www.cs.utexas.edu/users/modayil/modayil-proposal.pdf.





